DCTM: Discrete-Continuous Transformation Matching for Semantic Flow
Techniques for dense semantic correspondence have provided limited ability to
deal with the geometric variations that commonly exist between semantically
similar images. While variations due to scale and rotation have been examined,
practical solutions are lacking for more complex deformations such as affine
transformations because of the tremendous size of the associated solution
space. To address this problem, we present a discrete-continuous transformation
matching (DCTM) framework where dense affine transformation fields are inferred
through a discrete label optimization in which the labels are iteratively
updated via continuous regularization. In this way, our approach draws
solutions from the continuous space of affine transformations in a manner that
can be computed efficiently through constant-time edge-aware filtering and a
proposed affine-varying CNN-based descriptor. Experimental results show that
this model outperforms the state-of-the-art methods for dense semantic
correspondence on various benchmarks.
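To make the notion of a dense affine transformation field concrete, the following is a minimal sketch and not the DCTM implementation. It assumes each pixel carries its own 2x3 affine matrix; the names `apply_affine` and `warp_coords` are hypothetical. The discrete label optimization and the continuous regularization described in the abstract are not modeled here.

```python
def apply_affine(T, x, y):
    """Map (x, y) through a 2x3 affine matrix T = ((a, b, tx), (c, d, ty))."""
    (a, b, tx), (c, d, ty) = T
    return a * x + b * y + tx, c * x + d * y + ty


def warp_coords(field):
    """Apply a per-pixel affine field; field[y][x] is that pixel's 2x3 matrix."""
    height, width = len(field), len(field[0])
    return [[apply_affine(field[y][x], x, y) for x in range(width)]
            for y in range(height)]


# Illustrative data: an identity transform everywhere except one pixel,
# which is scaled by a factor of 2.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
scale2 = ((2.0, 0.0, 0.0), (0.0, 2.0, 0.0))
field = [[identity, identity], [identity, scale2]]
matches = warp_coords(field)
```

Each entry of `matches` is the location in the other image that the corresponding pixel maps to under its own affine transformation.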
Memory-guided Image De-raining Using Time-Lapse Data
This paper addresses the problem of single image de-raining, that is, the
task of recovering clean and rain-free background scenes from a single image
obscured by rain artifacts. Although recent approaches adopt real-world
time-lapse data to overcome the need for paired rain-clean images, they are
limited in their ability to fully exploit the time-lapse data. The main cause is that, in terms
of network architectures, they could not capture long-term rain streak
information in the time-lapse data during training owing to the lack of memory
components. To address this problem, we propose a novel network architecture
based on a memory network that explicitly helps to capture long-term rain
streak information in the time-lapse data. Our network comprises the
encoder-decoder networks and a memory network. The features extracted from the
encoder are read and updated in the memory network that contains several memory
items to store rain streak-aware feature representations. With the read/update
operation, the memory network retrieves relevant memory items in terms of the
queries, enabling the memory items to represent the various rain streaks
included in the time-lapse data. To boost the discriminative power of memory
features, we also present a novel background selective whitening (BSW) loss for
capturing only rain streak information in the memory network by erasing the
background information. Experimental results on standard benchmarks demonstrate
the effectiveness and superiority of our approach.
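The read/update operation can be illustrated with a generic attention-style sketch. The abstract does not specify the exact formulation, so the cosine-similarity addressing, the softmax weighting, and the `rate` parameter below are all assumptions. The function names are hypothetical, and the BSW loss is not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, memory):
    """Return the similarity-weighted sum of memory items and the weights."""
    weights = softmax([cosine(query, item) for item in memory])
    read = [sum(w * item[d] for w, item in zip(weights, memory))
            for d in range(len(query))]
    return read, weights


def update_memory(query, memory, weights, rate=0.1):
    """Move each item toward the query in proportion to its read weight."""
    return [[(1 - rate * w) * m + rate * w * q for m, q in zip(item, query)]
            for w, item in zip(weights, memory)]


# Illustrative data: two memory items and one encoder query.
memory = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
read, weights = read_memory(query, memory)
updated = update_memory(query, memory, weights)
```

In this sketch the item most similar to the query receives the largest read weight, and every item drifts toward the queries that retrieve it.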
Semantic-aware Network for Aerial-to-Ground Image Synthesis
Aerial-to-ground image synthesis is an emerging and challenging problem that
aims to synthesize a ground image from an aerial image. Due to the highly
different layout and object representation between the aerial and ground
images, existing approaches usually fail to transfer the components of the
aerial scene into the ground scene. In this paper, we propose a novel framework
to address these challenges by imposing enhanced structural alignment and
semantic awareness. We introduce a novel semantic-attentive feature
transformation module that reconstructs complex geographic
structures by aligning the aerial features to the ground layout. Furthermore, we
propose semantic-aware loss functions by leveraging a pre-trained segmentation
network. The network is encouraged to synthesize realistic objects across various
classes by separately calculating losses for different classes and balancing
them. Extensive experiments including comparisons with previous methods and
ablation studies show the effectiveness of the proposed framework both
qualitatively and quantitatively.
Comment: ICIP 2021. Code is available at https://github.com/jinhyunj/SANe
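The idea of calculating losses separately per class and balancing them can be sketched as follows. This is a simplified illustration rather than the loss used in the linked repository. It assumes a per-pixel L1 error and equal weighting across the classes present, and the name `class_balanced_l1` is hypothetical.

```python
def class_balanced_l1(pred, target, labels):
    """Average the L1 error within each class, then average across classes.

    pred, target: flat lists of pixel values.
    labels: per-pixel class ids, e.g. from a segmentation network.
    """
    sums, counts = {}, {}
    for p, t, c in zip(pred, target, labels):
        sums[c] = sums.get(c, 0.0) + abs(p - t)
        counts[c] = counts.get(c, 0) + 1
    per_class = {c: sums[c] / counts[c] for c in sums}
    balanced = sum(per_class.values()) / len(per_class)
    return balanced, per_class


# Illustrative data: class 1 covers a single pixel and is badly predicted.
pred = [0.0, 0.0, 0.0, 1.0]
target = [0.0, 0.0, 0.0, 0.0]
labels = [0, 0, 0, 1]
loss, per_class = class_balanced_l1(pred, target, labels)
plain_mean = sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
```

With these inputs the balanced loss is 0.5 while the plain pixel mean is 0.25, so the rare class is not drowned out by the dominant one.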
Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning
Compositional zero-shot learning (CZSL) aims to recognize unseen compositions
with prior knowledge of known primitives (attribute and object). Previous works
for CZSL often struggle to capture the contextuality between attribute and
object, the discriminability of visual features, and the long-tailed
distribution of real-world compositional data. We propose a simple and scalable
framework called Composition Transformer (CoT) to address these issues. CoT
employs object and attribute experts in distinct ways to generate
representative embeddings, using the visual network hierarchically. The object
expert extracts representative object embeddings from the final layer in a
bottom-up manner, while the attribute expert makes attribute embeddings in a
top-down manner with a proposed object-guided attention module that models
contextuality explicitly. To remedy biased prediction caused by imbalanced data
distribution, we develop a simple minority attribute augmentation (MAA) that
synthesizes virtual samples by mixing two images and oversampling minority
attribute classes. Our method achieves SoTA performance on several benchmarks,
including MIT-States, C-GQA, and VAW-CZSL. We also demonstrate the
effectiveness of CoT in improving visual discrimination and addressing the
model bias from the imbalanced data distribution. The code is available at
https://github.com/HanjaeKim98/CoT.
Comment: ICCV 202
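The two ingredients named for MAA, mixing two images and oversampling minority attribute classes, can be sketched as below. This is an assumption-laden illustration and not the code in the linked repository. The inverse-frequency sampling weights, the fixed mixing coefficient `lam`, and the choice to label the mixed sample with the minority-sampled attribute are all assumptions. All function names are hypothetical.

```python
import random
from collections import Counter


def minority_weights(attr_labels):
    """Sampling weights inversely proportional to attribute frequency."""
    freq = Counter(attr_labels)
    raw = [1.0 / freq[a] for a in attr_labels]
    total = sum(raw)
    return [r / total for r in raw]


def mix_images(img_a, img_b, lam):
    """Blend two flattened images with mixing coefficient lam."""
    return [lam * a + (1 - lam) * b for a, b in zip(img_a, img_b)]


def augment(images, attr_labels, rng, lam=0.5):
    """Draw one image with minority-biased sampling and mix it with another."""
    weights = minority_weights(attr_labels)
    i = rng.choices(range(len(images)), weights=weights, k=1)[0]
    j = rng.randrange(len(images))
    # Assumption: the virtual sample keeps the attribute of image i.
    return mix_images(images[i], images[j], lam), attr_labels[i]


# Illustrative data: "dry" is the minority attribute.
images = [[0.0, 2.0], [2.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
attrs = ["wet", "wet", "wet", "dry"]
weights = minority_weights(attrs)
virtual, virtual_attr = augment(images, attrs, random.Random(0))
```

Here the single "dry" sample is drawn with probability 0.5, as often as the three "wet" samples combined.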